Ego and Alter Coding Conventions
Matthew Chandler
NetSense
November 4, 2015
These are the conventions I used to code all egos and alters that appear in the NetSense data I cleaned and de-identified. The codes are consistent across the data files I have used: for example, each study participant has the same five-digit code wherever they appear in the demographic data, network survey data, or behavioral (communication events) data, including when they appear as an alter in the behavioral or network survey data. All other codes are similarly consistent across data files.
The codes were generated by random assignment. I used all available identifying information to match and disambiguate egos and alters across all data files.
(See the file “NetSense Data Manipulations.nb” for details.)
Three Digit
Known invalid nodes (e.g., voicemail, Twitter SMS alert service)
Four Digit
Suspicious nodes not known to be invalid (e.g., unknown short codes, very long numbers)
Five Digit
Study participants
10000-89999: most participants
90000-99999: participants marked for exclusion because of incomplete data
(See Participant Canon for details.)
Six Digit
Non-participants
100000-199999: non-participant alters named in the network surveys
200000-899999: non-participant alters not named in the network surveys
900000-999999: reserved for exclusions, if needed
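The ranges above can be applied mechanically when filtering the data. The following is a minimal Python sketch, not part of the original data pipeline. The function name `classify_code` and the category labels are hypothetical, and it assumes that three- and four-digit codes have no leading zeros (that is, they span 100-999 and 1000-9999), which the conventions above do not state explicitly.

```python
def classify_code(code: int) -> str:
    """Return the node category implied by a code, following the ranges listed above.

    Hypothetical helper for illustration only; the labels are not official names.
    """
    # Assumption: three- and four-digit codes have no leading zeros.
    if 100 <= code <= 999:
        return "known invalid node"
    if 1000 <= code <= 9999:
        return "suspicious node"
    # Five-digit codes: study participants.
    if 10000 <= code <= 89999:
        return "participant"
    if 90000 <= code <= 99999:
        return "participant marked for exclusion"
    # Six-digit codes: non-participants.
    if 100000 <= code <= 199999:
        return "non-participant alter named in network surveys"
    if 200000 <= code <= 899999:
        return "non-participant alter not named in network surveys"
    if 900000 <= code <= 999999:
        return "reserved for exclusions"
    raise ValueError(f"code {code} is outside the documented ranges")


def is_participant(code: int) -> bool:
    """True for any five-digit code, including participants marked for exclusion."""
    return 10000 <= code <= 99999
```

For example, `classify_code(123456)` falls in the 100000-199999 range and so is labeled as a non-participant alter named in the network surveys, while `is_participant(95000)` is true even though that participant is marked for exclusion.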